A Plethora of Methods for Learning English Countability
Abstract
This paper compares a range of methods for classifying words based on linguistic diagnostics, focusing on the task of learning countabilities for English nouns. We propose two basic approaches to feature representation: distribution-based representation, which simply looks at the distribution of features in the corpus data, and agreement-based representation, which analyses the level of tokenwise agreement between multiple preprocessor systems. We additionally compare a single multiclass classifier architecture with a suite of binary classifiers, and combine analyses from multiple preprocessors. Finally, we present and evaluate a feature selection method.
Related papers
Learning the Countability of English Nouns from Corpus Data
This paper describes a method for learning the countability preferences of English nouns from raw text corpora. The method maps the corpus-attested lexico-syntactic properties of each noun onto a feature vector, and uses a suite of memory-based classifiers to predict membership in 4 countability classes. We were able to assign countability to English nouns with a precision of 94.6%.
Reinforcing English Countability Prediction with One Countability per Discourse Property
Countability of English nouns is important in various natural language processing tasks. It especially plays an important role in machine translation since it determines the range of possible determiners. This paper proposes a method for reinforcing countability prediction by introducing a novel concept called one countability per discourse. It claims that when a noun appears more than once in ...
The Ins and Outs of Dutch noun countability classification
This paper presents a range of methods for classifying Dutch noun countability based on either Dutch or English data. The classification is founded on translational equivalences and the corpus analysis of linguistic features which correlate with particular countability classes. We show that crosslingual classification on the basis of word-to-word or feature-to-feature mappings between English an...
Using an Ontology to Determine English Countability
In this paper we show to what degree the countability of English nouns is predictable from their semantics. We found that the countability of 78% of nouns could be predicted using an ontology of 2,710 nodes. We also show how this predictability can be used to aid non-native speakers to determine the countability of English nouns when building a bilingual machine translation lexicon.
Crosslingual Countability Classification: English meets Dutch
This paper presents a range of methods for classifying Dutch nouns as countable, uncountable or plural only based on both Dutch and English data. The classification is based on the occurrence of countability specific linguistic features that are extracted from unannotated corpora. We show that in the absence of reliable Dutch gold standard data, cross-linguistic classification can be achieved o...
Publication date: 2003